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Abstract 


Discrete-event  simulation  is  one  of  the  most  important  techniques  available  for  study¬ 
ing  complex  stochastic  systems.  In  this  paper  we  review  the  principal  methods  available 
for  analyzing  both  the  transient  and  steady-state  simulation  problems  in  sequential  eind 
parallel  computing  environments.  Next  we  discuss  several  of  the  variance  reduction  meth¬ 
ods  designed  to  make  simulations  nm  more  efficiently.  Finally,  a  short  discussion  is  given 
of  the  methods  available  to  study  system  optimization  using  simulation.  ~  , 


V 


Keywords-,  stochastic  simulation,  output  analysis,''.variance  reduction,  parallel  computa- 


system  optimization. 


AccesiOft  For 

1 

^'TIS  CRA&I 

y 

DTIC  TAB 

□ 

Unannounced 

□ 

Justification 

By 

Distribution  / 

Avaiidbility 

Codes 

Avdii  .md/or 


1.  Introduction. 

Computer  simulation  of  complex  stochastic  systems  is  an  important  technique  for 
evzduating  system  performance.  The  starting  point  for  this  method  is  to  formulate  the 
time  varying  behavior  of  the  system  as  a  basic  stochastic  process  Y  =  {¥(1)  :  t  >  0}, 
where  y(-)  may  be  vector- valued.  [Discrete  time  processes  cam  also  be  handled.]  Next 
a  computer  prograim  is  written  to  generate  sample  realizations  of  Y .  Simulation  output 
is  then  obtained  by  running  this  program.  Our  discussion  in  this  paper  is  centered  on 
the  anadysis  of  this  simulation  output.  The  goal  being  to  develop  sound  probabilistic  auid 
statistical  methods  for  estimating  system  performance. 

Two  principal  problems  arise:  the  transient  simulation  problem  and  the  steady-state 
simulation  problem.  Let  T  denote  a  stopping  time  and  X  =  :  0  <  t  <  T},  where  h 

is  a  given  real- valued  fvmction.  The  transient  problem  is  to  estimate  a  =  Examples 

of  a  include  the  following: 

a  =  E{f{Yito)}, 

and 

a  =  P{Y  does  not  enter  A  before  <o}- 

Here  to  is  a  fixed  time  (>  0),  /  is  a  given  real-valued  function,  and  A  is  a  given  subset  of 
the  state-spaice  of  Y.  The  transient  problem  is  relevant  for  systems  running  for  a  limited 
(but  possibly  random)  length  of  time  that  camnot  be  expected  to  reach  a  steawiy-state.  Our 
goal  here  is  to  provide  both  point  and  interval  estimates  for  a. 

For  the  steady-state  problem  we  assume  the  Y  process  is  asymptotically  stationary 
in  the  sense  that 

\  [  nY(s))d,  ^  a 

as  t  -+  oo.  Here  =>  denotes  weak  convergence  and  /  is  a  given  real- valued  function 
defined  on  the  state-spau:e  of  Y.  The  easiest  example  to  think  about  here  is  an  irreducible, 
positive  recurrent,  continuous  time  Marrkov  chain.  In  this  case  Y{t)  =>■  Y  as  t  — »  oo  and 
a  =  E{f{Y)).  Examples  of  a  in  this  case  include  the  following; 

a  =  E{Y^}  (when  K  is  re  ad- valued). 
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a  =  P{y  €  A}, 


and 

a  =  E{c{Y)), 

where  c  is  a  given  cost  function.  Again  as  in  the  transient  case,  we  wish  to  construct  both 
point  and  interval  estimates  for  a. 


2.  Transient  Problem. 

Assume  we  have  a  computational  budget  of  f  time  units  with  which  to  simulate  the 
process  Y  and  estimate  o  =  E{X},  as  defined  in  Section  1.  In  a  sequential  computing 
environment  we  would  generate  independent,  identically  distributed  (iid)  copies 

(Xi,ri),(X2,r2),-.., 


where  the  Xi's  are  copies  of  X  and  Tj  is  the  computer  time  required  to  generate  X,.  Let 
N{t)  denote  the  number  of  copies  of  X  generated  in  time  t;  this  is  just  the  renewal  process 
aissociated  with  the  iid  r^’s.  A  natural  point  estimator  for  a  is 

(  N(t) 


N{t)  >  0 

\.)  =  0. 


The  standard  eisymptotic  results  for  Xs{t)  are  the  strong  law  of  large  numbers  (SLLN) 
and  the  central  limit  theorem  (CLT). 


STRONG  LAW  OF  LARGE  NUMBERS.  If  E{ri}  <  oo  and  E{\  Xi  |}  <  oo,  then 
Xs{t)  —*  «  B8  t  —*  oo. 

CENTRAL  LIMIT  THEOREM.  If  E{n}  <  oo  and  var{A’i}  <  oo,  then 

-  o|  =>  (£{n}  ■  1). 

where  iV(0,l)  is  a  mean  zero,  vsuiance  one  normal  reindom  variable.  The  SLLN  follows 
from  the  SLLN  for  iid  summands  and  the  SLLN  for  renewal  processes.  The  CLT  result 
can  be  found  in  BILLINGSLEY  (1968),  Section  17. 
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From  the  SLLN  we  see  that  XjV(t)  is  a  strongly  consistent  point  estimator  for  a.  Thus 
for  large  t  we  would  use  as  our  point  estimate.  On  the  other  hzind,  the  CLT  can  be 

used  in  the  standard  manner  to  construct  a  confidence  interval  for  a.  Here  the  constant 
£{ri}  •  uar{,Y,}  appearing  in  the  CLT  would  have  to  be  estimated. 

Suppose  now  that  we  atre  in  a  parallel  computing  environment  with  p  independent 

processors.  Now  we  wish  to  estimate  a  for  a  fixed  t  as  p  —*  oo.  On  the  p  processors  we 

generate  iid  copies  of  (JY,  r): 

(■Yn,  I’ll),  (-Yi2,ti2)  ,•••,  Tia/i(o) 

(-Y21, 7-2i),  (X22,  7'22)  ,•••,  (-Y2Nj(t)' ’’2Nj(l)) 


(-Ypl,  Tpi),  (A^p2, '^p2)  (■Yp.Vp(t), 

A  number  of  estimators  have  been  proposed  for  estimating  a  =  E{X}.  The  most  natural 
estimator  to  consider  first  is  that  obtained  by  averaging  the  realizations  of  JY  across  each 
processor  and  then  averaging  these  sample  means.  This  leads  to 

oiiiPit)  =  "Z 

^  i=I 


where 


I  0  ,  iV,(<)  =  0. 

Here  the  processing  ends  on  all  processors  at  time  Tp  =  t.  If  <  00  and  E{\  JY  [}  <  oc, 


then  for  8l11  t  >  0 


«i(p,0  E{Xtf(^t)}  =  E{X  •  a 


as  p  —*  00.  Here  1^  is  the  indicator  function  of  the  set  A.  Unfortunately,  E{X}  ^ 
EfA”  •  and  so  Qi(p,t)  is  not  strongly  consistent  for  a  as  p  — »  00. 

The  next  estimator  for  a  was  proposed  by  HEIDELBERGER  (1987).  For  this  esti¬ 
mator  we  let  aU  processors  complete  the  replication  in  process  at  time  t.  The  estimator 


P  N.(0+1 

E  E 


Ei^-c) + 


4 


Here  all  processors  complete  by  time 


Tp  =  max  [r.i  +  ri2  +  •  •  •  + 

1  <  1  vp 

Unfortunately,  Tp  —*  +00  a.s.  as  p  —*  00.  However,  a2(p,<)  is  strongly  consistent  for  a. 
To  see  this,  note  that  if  Z'd  -Y  |}  <00  and  P{t  >  0}  >  0,  then  as  p  — >  00 

(  NdtHi 


Ei 


a2(p,t) 


j=i 


=  F{X}  0.3. 


The  equzdity  about  is  simply  Wald’s  equation.  Finally,  since  a2(p,  t)  is  a  ratio  estimator, 
a  CLT  is  also  available  from  which  a  confidence  interval  can  be  constructed. 

The  last  estimator  we  consider  was  proposed  by  HEIDELBERGER  and  GLYNN 
(1987).  Here  we  set 

p 


^  i=l 


where 


Given  N{t)  >  1,  Heidelberger  and  Glynn  show  that  the  pairs  of  random  variables  (Xi,  ri), 
. . . ,  {Xff(t),  T;v(|))  are  exchangeable.  Using  this  fact,  they  prove  that  =  E{Xi}. 

Since  the  JY^^^^j’s  are  iid,  we  see  that  a3(t)  is  strongly  consistent  for  a  =  E{Xi}.  Since 
the  summands  in  a3(p,  <)  are  iid,  the  standard  CLT  holds  (under  appropriate  variance 
assumptions)  and  can  be  used  to  develop  a  confidence  interval  for  a.  Note  that  the 
definition  of  reqviires  the  tth  processor  to  complete  the  replication  in  process  at 

time  t,  if  no  observations  have  been  completed  by  time  t;  i.e.,  th  >  t.  Thus  the  completion 
time  for  all  p  processors  is  given  by 


Tp  =  max  {max(t,  r,i)}. 

l<i<p 

While  Tp  -*  00  0.3.  as  p  — >  00  (if  P{rii  >  t)  >  0),  Tp  goes  to  infinity  at  a  much  slower 
rate  than  is  the  case  for  a2(p,  i).  They  also  show  that  the  following  CLT  holds: 

-  “1  => m  1) 
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as  t  oc.  where  we  assume  0  <  cr^  =  yar{jVi}  <  oo  and  0  <  E{ti}  <  cc.  Thus  Xs{t) 
can  cdso  be  used  in  a  sequential  environment  to  estimate  a. 


3.  Steady-State  Problem. 

The  steady-state  estimation  problem  is  considerably  more  difficult  than  the  transient 
estimation  problem.  This  difficulty  stems  from  the  following  considerations:  (i)  need  to 
estimate  long-r\m  system  behavior  from  a  finite  length  simulation  run;  (ii)  an  initial  bias 
(or  transient)  usually  is  present  since  the  process  being  simulated  is  non-stationary;  and 
(iii)  strong  autocorrelations  are  usually  present  in  the  process  being  simulated.  While 
classical  statistical  methods  can  often  be  used  for  the  transient  estimation  problem,  these 
methods  generally  fail  for  the  steady-state  estimation  problem  for  the  reasons  mentioned 
above. 

Assume  our  simulation  output  process  is  Y  ~  {Y{t)  :  t  >  0}  and  for  a  given  real- 
\-alued  function  / 

Q{t)  =  ~  f[Y{s)]ds  =>  a.  (1) 

As  stated  above,  we  wish  to  construct  point  and  interval  estimators  for  a.  In  addition  to 
(1),  many  methods  also  assume  that  a  positive  constant  a  exists  such  that  the  following 
CLT  holds: 

v^[a(t)-a]=><T-iV(0,l)  (2) 

as  t  — »  00.  From  (1)  and  (2)  we  can  construct  a  point  estimate  and  confidence  interval  for 
a  provided  we  can  estimate  <7.  Estimating  a  is  generally  the  hardest  problem. 

A  variety  of  methods  have  been  developed  to  awidress  the  steady-state  estimation 
problem.  In  Figure  1  we  have  given  a  break-down  of  these  methods.  Most  of  the  methods 
are  single  replicate  methods,  since  mxiltiple  replicate  methods  tend  to  be  inefficient  because 
of  the  initial  bias  problem. 

Here  we  only  consider  single  replicate  methods.  These  methods  are  of  two  types: 
those  that  consistently  estimate  a  and  those  in  which  a  is  cancelled  out. 

For  consistent  estimation  of  <7,  we  need  a  process  {s(t)  :  t  >  0}  such  that  s{t)  =>•  a. 
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STEADY- STATE 
CONFIDENCE  INTERVAL 
GENERATION  METHODS 


Figure  1 
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In  which  case  (2)  leauis  to  a  100(1  —  6)  %  confidence  interval  for  a  given  by 

la(t)  -  z{l  -  +  z{l  -  6/2)sit)/t^/% 

where  $(z(l  —  S/2))  =  1  —  S/2  and  $  is  the  standard  normal  distribution  function. 

On  the  other  hand,  the  canceling  out  methods  require  a  non- vanishing  process  {Z{t)  : 
t  >  0}  such  that 

[t^'\a{t)  -  a),Zit)]  ^  [<TN{0,l),aZ] 

as  t  —*  oo.  Then  using  the  continuous  mapping  theorem  (cf.,  BILLINGSLEY  (1968),  p. 
30)  we  have 

t^/^ia(t)-a)/Z(t)=^NiO,l)/Z  (3) 

as  t  —*  oo.  Note  from  (3)  that  cr  has  been  cancelled  out  in  a  manner  reminiscent  of  the 
t-statistic. 

First  we  discuss  one  of  the  methods  in  which  <t  is  consistently  estimated,  namely, 
the  regenerative  method;  see  IGLEHART  (1978)  for  a  discussion  of  this  method  plus 
other  background  material.  Here  we  assume  that  the  simulation  output  process  V  is  a 
regenerative  process.  We  are  given  a  real-valued  function  /  smd  wish  to  estimate  a{f)  = 
E{f{Y)},  where  Y{t)  ^  Y  as  t  — ►  oo.  Again  it  is  convenient  to  think  of  V  as  an 
irreducible,  positive  recurrent,  continuous  time  Markov  chsun.  Let  T(0)  =  0,Ti,  T2, ...  be 
the  regeneration  times  for  Y  and  set  Ti  =  Ti  —  >  1.  The  r/s  are  the  lengths  of 

the  regenerative  cycles.  Next  define  the  areas  imder  the  Y  process  in  the  fcth  regenerative 
cycle  by 

n(/)  =  r  f[Y{s)\d,. 

The  following  basic  facts  provide  the  foundation  for  the  regenerative  method: 

(i)  the  pairs  {(Fk(/),r*)  :  fc  >  1}  are  iid; 

(ii)  if  £{|  ffX)  1}  <  00,  then  a(/)  =  £{y',(/))/£{n}. 

The  regenerative  method  can  be  developed  on  either  the  intrinsic  time  scale  (t)  or  on  the 
random  time  scale  (n)  corresponding  to  the  number  of  regenerative  cycles  simulated.  On 
the  intrinsic  time  scale  our  point  estimate  for  a  is  given  by 

=  )  J‘  nY($))ds, 
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where  t  is  the  length  of  time  the  simulation  is  run.  On  the  random  time  scale  our  point  is 
given  by 

a„(/)  =  F„(/)/f„, 

where  Yn{f)  (respectively,  f„)  is  the  sample  mean  of  Y\if) . Ynif)  (^i . ’’’n)-  Here  the 

Y  process  is  simulated  to  the  completion  of  n  regenerative  cycles.  Using  the  basic  facts 
(i)  and  (ii)  above,  it  cem  be  shown  that  both  a{t,f)  and  Qnif)  are  strongly  consistent  for 
q(/)  as  t  and  n  respectively  tend  to  infinity.  Next  we  define  Z*  =  Yk(f)  —  a(/)ri  and 
assume  that  var{Zk)  =  <7^  <  oo.  Then  it  can  be  shown  that  the  following  two  CLT's  hold 
as  t  — >  oo  and  n  — ►  oo: 


-  aif)]  ^  (a/E*/2{ri})lV(0, 1), 

and 

n‘''[o~(/)  -  a(/)l  =►  (<7/f:{T,))JV(0, 1). 

These  two  CLT’s  can  then  be  used  to  construct  confidence  intervals  for  a(/)  provided  both 
(7^  and  ^{ri}  can  be  estimated.  The  mean  £{ti}  is  easily  estimated  by  f„  emd  can  be 
estimated  from  its  definition  in  terms  of  Yi{f)  and  ri. 

Next  we  turn  to  a  discussion  of  the  principal  method  avrulable  for  camceling  out  a. 
This  is  the  method  of  stamdardized  time  series  developed  by  SCHRUBEN  (1983).  Our 
discussion  is  based  on  the  paper  GLYNN  and  IGLEHART  (1989)  amd  uses  some  results 
from  weak  convergence  theory;  see  BILLINGSLEY  (1968)  for  background  on  this  theory. 
Ftom  our  output  process  Y  we  form  the  random  elements  of  C[0, 1],  the  space  of  real-valued 
continuous  functions  on  the  interval  [0, 1],  given  by 

Ynit)  =  -  /  Y{3)d3 

«  Jo 

and 

Xn{t)  =  ni/2(y„(<)  -  at], 

vhere  0  <  t  <  1  and  n  >  1.  Now  we  make  the  basic  assumption  that  a  finite,  positive 
constant  <7  exists  such  that 

Xn  crB  as  n  — >  oo,  (4) 
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where  B  is  standard  Brownian  motion.  This  assumption  holds  for  a  wide  class  of  output 
processes.  To  find  the  scaling  process  {Z{t)  :  t  >  0}  consider  the  class  M  of  functions 
g  :  C[0, 1]  — »  iR  such  that 

(i)  g{ax)  =  ag{x)  for  all  a  >  0  and  x  €  C(0, 1]; 

(ii)  g{B)  >  0  with  probability  one; 

(iii)  g{x  -h  3k)  =  g{x)  for  all  real  $  and  x  6  C[0, 1],  where  k{t)  =  t; 

(iv)  P{B  €  D{g)}  =  0,  where  D{g)  is  the  set  of  discontinuities  of  g. 

The  process 


Sn{t)  =  0<(<1, 

is  called  a  standardized  time  series.  Using  weak  convergence  arguments  it  is  easy  to  show 
from  (4)  that 


5„(1)  =>  B(1)/<,(B) 


(5) 


as  n  —*  oo.  Unfolding  this  CLT  we  have  the  following  100(1  —  S)%  confidence  interval  for 

a: 

[y„(l)  -  z(l  -  6/2)g{Y^),Yn{\)  +  z{6/2)g{Yn)i 

where  P{B{l)/g{B)  <  ^(q;)}  =  q  for  0  <  a  <  1.  Thus  each  g  €  M  gives  rise  to  a 
confidence  interval  for  a  provided  we  can  find  the  distribution  of  B{l)/g{B).  Fortimately, 
this  can  be  done  for  a  number  of  interesting  g  functions. 

One  of  the  g  functions  leads  to  the  batch  means  method,  perhaps  the  most  popular 
method  for  steady-state  simulation.  We  conclude  our  discrission  of  the  method  of  stan¬ 
dardized  time  series  by  displaying  this  special  g  function.  To  this  end  we  first  define  the 
Brownian  bridge  mapping  T  :  C[0, 1]  — *  C[0, 1]  as 


{Tx){t)  =  xit)-tx{l),  xeC[0,l],  0<f<l 

Now  think  of  partitioning  our  original  output  process  Y  into  m  >  2  intervals  of  equal 
length  and  define  the  mapping  bm  •  ^[0, 1]  -♦  iR  by 


bm(x)  = 


^  '  1=1 
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for  I  6  C[0, 1].  Finzdly,  the  g  function  of  interest  is  gn  =  bmoT.  To  see  that  gm  corresponds 
to  the  batch  means  method  we  observe  that 


gmiYn)  =  rn 


=  rr,-l/2 


m 


1  I  1 

— x:u.(n)--5:z» 

•=1  \  ;=I 


n  >/2 


where 


fin/m 

Z.(n)=  /  Y{x)dxnn/m) 

J  («  — l)n/m 


is  the  ith  batch  mean  of  the  process  {K(t)  :  0  <  t  <  n}.  Specializing  (5)  to  the  function 
gm  we  see  that 

m 

1=1 

eis  n  — ♦  oo,  where  tm-i  is  a  Student ’s-<  random  variable  with  m  —  1  degrees  of  freedom. 
This  follows  from  the  fact  that  B{l)/gm{B)  is  distributed  as  tm-\  since  B  has  independent 
normal  increments.  For  other  examples  of  functions  y  6  Af  for  which  the  distribution  of 
B{\)lg{B)  is  known  see  GLYNN  and  IGLEHART  (1989). 


4.  Variance  Reduction  Techniques. 

Once  a  basic  method  is  developed  to  produce  point  estimates  aind  confidence  inter- 
\'als  for  a  parameter  of  interest,  we  turn  our  attention  to  making  these  methods  more 
efficient.  Over  the  years  a  dozen  or  more  techniques  have  been  proposed  to  improve  sim¬ 
ulation  efficiency.  Gcx)d  references  for  many  of  these  techniques  are  BRATLEY,  FOX, 
and  SCHRAGE  (1987),  WILSON  (1984).  Here  we  have  elected  to  outline  three  of  these 
techniques. 

As  we  have  seen  in  Sections  2  and  3,  confidence  intervals  for  parameters  being  es¬ 
timated  are  generally  constructed  from  associated  CLT.  Each  CLT  has  an  intrinsic 
variance  constant,  say,  (t\.  The  idea  for  many  variance  reduction  techniques  (VRT’s)  is  to 
modify  the  original  estimate  in  such  a  way  as  to  yield  a  new  CLT  with  a  variance  constant 
a\  <  af.  This  will,  of  course,  lead  to  confidence  intervals  of  shorter  length,  or  alterna¬ 
tively,  confidence  intervals  of  the  same  length  from  a  shorter  simulation  run.  Frequently 
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VRT’s  axe  based  on  some  analytic  knowledge  or  structural  properties  of  the  process  being 
simulated. 

The  first  VRT  we  discuss  is  known  as  importance  sBimpling.  This  idea  was  first 
developed  in  conjunction  with  the  estimation  of  E{h{X)}  =  a,  where  h  is  a  known  real- 
\’alued  function  and  X  a  random  variable  with  density,  say,  /.  Insteswl  of  sampling  X  from 
/.  we  sample  X  from  a  density  g  which  has  been  selected  to  be  large  in  the  regions  that 
are  “most  important”,  namely,  where  |/|  is  largest.  Then  we  estimate  a  by  the  sample 
mean  of  h{X)f{X)/giXy,  see  HAMMERSLEY  and  HANDSCOMB  (1964). 

This  same  basic  idea  can  be  carried  forward  to  the  estimation  of  parameters  associated 
with  stochastic  processes.  We  generate  the  process  with  a  new  probabilistic  structure  and 
estimate  a  modified  parameter  to  produce  an  estimate  of  the  original  quantity  of  interest. 
The  exzimple  we  consider  here  is  the  M/M/1  queue  with  arrival  rate  A,  service  rate  /i, 
and  traffic  intensity  p  =  X/p  <  1.  Let  V  denote  the  stationary  virtual  waiting  time  and 
consider  estimating  the  quantity  a  =  P{V’>u}for  large  u.  When  p  is  less  than  one,  the 
virtual  waiting  time  process  has  a  negative  drift  amd  an  impenetrable  barrier  at  zero.  Thus 
the  chance  of  the  process  getting  above  a  large  u  is  small,  and  a  long  simulation  would  be 
required  to  accurately  estimate  a.  The  idea  used  here  in  importance  sampling  is  to  generate 
a  so-called  conjugate  process  obtained  by  reversing  the  roles  of  A  and  p.  For  the  conjugate 
process  the  traffic  intensity  is  greater  than  one,  and  the  estimation  problem  becomes  much 
easier.  ASMUSSEN  (1985)  reports  efficiency  increases  on  the  order  of  a  fawjtor  of  3  to  a 
factor  of  400  over  straight  regenerative  simulation  depending  on  the  values  of  p  and  u.  In 
general,  importance  sampling  can  yield  very  significant  variance  reductions.  Fbrther  work 
along  these  lines  can  be  foxmd  in  SIEGMUND  (1976),  GLYNN  and  IGLEHART  (1989), 
SHAHABUDDm  <j1.  (1988),  and  WALRAND  (1987). 

The  second  VkT  we  discuss  is  known  as  indirect  estimation.  Assume  we  are  interested 
in  estimating  o  =  E{.\  },  but  happen  to  know  that  E{Y}  —  aE{X}  -f-  h  where  a  and  b  are 
known.  Sometimes  it  happens  that  a  CLT  associated  with  the  estimation  of  E{Y}  will  have 
a  smaller  variance  constant  associated  with  it  than  does  the  CLT  for  estimating  .E{A'}.  In 
this  case  we  would  prefer  to  estimate  .F{Y}  and  we  use  the  affine  transformation  above  to 
yield  an  estimate  for  E{X}.  This  idea  has  proved  to  be  useful  in  queuing  simulations  where 
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the  affine  transfonnation  is  a  result  of  Little’s  Law.  In  general,  variance  reductions  realized 
using  this  method  axe  not  dramatic,  being  usually  less  than  a  factor  of  2.  For  further 
results  along  these  lines,  see  LAW  (1975)  zmd  GLYNN  and  WHITT  (1986).  While  the 
affine  tramsformation  works  in  queuing  theory,  it  is  conceivable  that  other  tramsformations 
might  arise  in  different  contexts. 

The  third  aind  final  VRT  we  discuss  here  is  known  as  discrete  time  conversion.  Suppose 
that  X  =  {A'(#)  :  <  >  0}  is  an  irreducible,  positive  recurrent,  continuous  time  Mairkov 
chaun  (CTMC).  Then  X{t)  ^  X  aa  t  —*  oo  aind  we  may  be  interested  in  estimating 
a  =  E{f{X)},  where  /  is  a  given  real-valued  function.  As  we  have  discussed  above,  the 
regenerative  method  cam  be  used  to  estimate  a.  A  CTMC  hais  two  sources  of  ramdomness: 
the  embedded  discrete  time  jump  chaiin  and  the  exponential  holding  times  in  the  successive 
states  visited.  The  discrete  time  conversion  method  eliminates  the  ramdomness  due  to  the 
holding  times  by  replacing  them  by  their  expected  vadues.  It  hais  been  shown  that  this 
leads  to  a  variance  reduction  when  estimating  a.  Also,  as  an  abided  side  benefit  computer 
time  is  saved  since  the  exponential  holding  times  no  longer  need  to  be  generated.  Gauns  in 
efficiency  for  this  method  cam  be  substantial.  Further  discussion  of  this  idea  can  be  found 
in  HORDIJK,  IGLEHART,  and  SCHASSBERGER  (1976),  and  FOX  and  GLYNN  (1986). 

5.  System  Optimization  Using  Simulation. 

Consider  a  family  of  stochaistic  systems  indexed  by  a  parauneter  6  (perhaps  vector¬ 
valued).  Suppose  a(6)  is  our  performance  criterion  for  system  Otir  concern  here  is  to  find 
that  system,  say  ^oi  which  optimizes  the  value  of  a.  For  a  complex  system  it  is  frequently 
impossible  to  evaluate  a  analytically.  Simulation  may  be  the  most  attraM:tive  alternative. 
We  could  naively  simulate  the  systems  at  a  sequence  of  parameter  settings 
and  select  setting  that  optimizes  a(^,).  In  general  this  would  not  be  very  efficient,  since 
k  would  have  to  be  quite  large.  A  better  way  would  be  to  estimate  the  gradient  of  a  and 
use  this  estimate  to  establish  a  search  direction.  Then  stochastic  approximation  and  ideas 
from  non-linear  programming  could  be  used  to  optimize  or. 

Two  general  methods  have  been  proposed  to  estimate  gradients:  the  likelihood  ratio 
method  and  the  infinitesimal  perturbation  method.  We  will  discuss  both  methods  briefly. 
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Suppose  X  —  {X„  ;  n  >  0}  is  a  discrete  time  Markov  chain  (DTMC)  and  that  the  cost  of 
running  system  ^  for  n  +  1  steps  is  JVo,  ■  •  •  ,Xn)-  The  expected  cost  of  running  system 
9  is  then  given  by 

=  (6) 

where  is  expectation  relative  to  the  probability  measure  P{9)  associated  with  system  6. 
If  Ee{-}  were  independent  of  6,  we  would  simply  simulate  iid  replicates  of 
Vy(^,  Xo, . . . ,  JVn)-  By  introducing  the  likelihood  ftmction  L(5,  Xo, . . . ,  X„)  it  is  possi¬ 
ble  to  write  a{S)  sis 


a(d)  =  Ee, {g(e,  A^o,  •  •  • ,  Xn)L(d,  Xo,  •  •  • ,  Xn)} 


for  a  fixed  value  of  9q.  Then  we  can  write 

Va(9)  =  E,,  {Vg(9,  Xo,  •  •  • ,  Xn)L(9,  Xo,---,  Xn)}, 

where  the  interchange  of  V  and  E^^  must  be  justified.  A  similar  approach  can  be  developed 
to  estimate  the  gradient  of  a  performance  criterion  for  a  steady-state  simulation.  For  an 
overview  of  this  approach  see  GLYNN  (1987),  and  REIMAN  and  WEISS  (1986). 

The  second  method  which  has  been  proposed  for  estimating  gradients  is  called  the 
infinitesimal  perturbation  analysis  (IPA)  method.  In  this  method  a  derivative,  with  respect 
to  em  input  parameter,  of  a  simulation  sample  path  is  computed.  For  example,  we  might 
be  interested  in  estimating  the  mean  stationary  waiting  time  for  a  queueing  system  as  well 
2is  its  derivative  with  respect  to  the  mean  service  time.  Since  we  are  taking  a  derivative 
of  the  sample  path  inside  an  expectation  operator,  the  interchange  of  expectation  and 
differentiation  must  be  justified  in  order  to  produce  an  estimate  for  the  gradient  Va(^), 
say.  The  IPA  method  assumes  that  if  the  change  in  the  input  parameter,  6,  is  small 
enough,  then  the  times  at  which  events  occur  get  shifted  slightly,  but  their  order  does 
not  change.  It  has  been  shown  that  the  IPA  method  yields  strongly  consistent  estimates 
for  the  performance  gradient  in  a  variety  of  queueing  contexts;  see  HEIDELBERGER, 
CAO,  ZAZANIS,  and  SURI  (1988)  for  details  on  the  IPA  method  and  a  listing  of  queueing 
problems  for  which  the  technique  works. 
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